Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

نویسندگان

  • Barnabás Póczos
  • Liang Xiong
  • Jeff G. Schneider
چکیده

Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them for machine learning tasks on distributions, and show empirical results on synthetic data, real word images, and astronomical data sets.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multivariate f-divergence Estimation With Confidence

The problem of f -divergence estimation is important in the fields of machine learning, information theory, and statistics. While several nonparametric divergence estimators exist, relatively few have known convergence properties. In particular, even for those estimators whose MSE convergence rates are known, the asymptotic distributions are unknown. We establish the asymptotic normality of a r...

متن کامل

Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning

Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution ...

متن کامل

Active Learning for Probability Estimation Using Jensen-Shannon Divergence

Active selection of good training examples is an important approach to reducing data-collection costs in machine learning; however, most existing methods focus on maximizing classification accuracy. In many applications, such as those with unequal misclassification costs, producing good class probability estimates (CPEs) is more important than optimizing classification accuracy. We introduce no...

متن کامل

Bias Reduction and Metric Learning for Nearest-Neighbor Estimation of Kullback-Leibler Divergence

Asymptotically unbiased nearest-neighbor estimators for KL divergence have recently been proposed and demonstrated in a number of applications. With small sample sizes, however, these nonparametric methods typically suffer from high estimation bias due to the non-local statistics of empirical nearest-neighbor information. In this paper, we show that this non-local bias can be mitigated by chang...

متن کامل

Nonparametric Kernel Estimation and Regression on Distributions

Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. We wish to expand the domain of consideration and let each instance correspond to a continuous probability distribution ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011